Abstract



The dynamics of interacting perceptrons is solved analytically. For a directed 
flow of information the system runs into a state which has a higher symmetry 
than the topology of the model. A symmetry breaking phase transition is 
found with increasing learning rate. In addition it is shown that a system of
interacting perceptrons which is trained on the history of its minority decisions
develops a good strategy for the problem of adaptive competition known as
the Bar Problem or Minority Game.

Simple models of neural networks describe a wide variety of phenomena in neurobiology 
and information theory. Neural networks are systems of elements interacting by adaptive 
couplings which are trained by a set of examples. After training they function as content 
addressable associative memory, as classifiers or as prediction algorithms. Using methods of 
statistical physics many of these phenomena have been elucidated analytically for infinitely 
large neural networks |],@]. 



Up to now, only isolated neural networks have been investigated theoretically. However,
many phenomena in biology, social science and computer science may be modelled by a
system of interacting adaptive algorithms. Nothing is known about general properties of
such systems. In this Letter we present the first analytic solution of a system of interacting
perceptrons. For simplicity, we restrict ourselves to simple perceptrons with binary output. 
The dynamics of a set of perceptrons learning from each other by a directed flow of 
information is solved analytically. Starting from a nonsymmetric initial configuration, the 
system relaxes to a final state which has a higher symmetry than the ring like flow of 
information. The system tries to stay as symmetric as possible. In some cases we find 
a phase transition: When the learning rate is increased the system suddenly breaks the 
symmetry and relaxes to a state with nonsymmetric overlaps. 
In addition we show that a system of interacting neural networks can develop a good
strategy for a model of adaptive competition in closed markets, the El-Farol Bar problem 
||. In this problem N agents are competing for limited resources and the individual profit 
depends on the collective behaviour. Recently a variation of this problem known as the 
Minority Game has been studied theoretically in a series of publications @-f|. The Minority 
Game model has several peculiarities: (a) The strategies of the agents are quenched random
variables (decision tables), given in advance, and each agent can only choose between a few
of these tables. (b) Some of the agents are frozen as losers, at least in some regions of the
parameter space. In a realistic situation permanent losers would change the strategy after
some time. (c) A good performance is only achieved if the number of time steps each agent
is using for his/her decision is adjusted to the number of agents.

Our approach shows none of these drawbacks. Each agent is using one perceptron for 
his/her decision with couplings which are trained to the minority of all the outputs. Hence 
the strategies develop according to the dynamics of the system. We analytically calculate 
the statistical properties of such a system of interacting perceptrons. The system performs
optimally in the limit of small learning rates and is insensitive to the number of
time steps taken for the decision. Each agent receives the same profit in the long run.

The perceptron is the simplest model of a neural network. It has one layer of synaptic
weights $\mathbf{w} = (w_1, \ldots, w_M)$ and one output bit $\sigma$ which is given by

$$\sigma = \mathrm{sign}\left(\sum_{i=1}^{M} w_i x_i\right) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x}). \tag{1}$$

$\mathbf{x}$ is the input vector of dimension M; for instance it is given by a window of a bit sequence
$S_t \in \{+1, -1\}$, $t = 1, 2, \ldots, M$, with $\mathbf{x}_t = (S_{t-M+1}, \ldots, S_t)$, or it consists of random binary
or Gaussian variables. A training example is a pair of input vector and output bit $(\mathbf{x}, \sigma)$;
a perceptron learns this example by adjusting its weights to it. Here we consider three

well-known learning rules |T|: 

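H: Hebbian learning,

$$\mathbf{w}_{\mathrm{new}} = \mathbf{w}_{\mathrm{old}} + \frac{\eta}{M}\, \sigma\, \mathbf{x}. \tag{2}$$

P: Perceptron learning: H is applied only if the example is misclassified, $\mathbf{w}_{\mathrm{old}} \cdot \mathbf{x}\, \sigma < 0$.

PN: Learning with normalization: After each step P the weights are normalized, $\mathbf{w}_{\mathrm{new}} \cdot \mathbf{w}_{\mathrm{new}} = 1$.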
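
The three rules can be sketched in a few lines of Python. This is only an illustration, assuming that Eq. (2) has the form w_new = w_old + (η/M) σ x. The function names and the tie convention sign(0) = +1 are our own choices and are not part of the text.

```python
import math


def sign(value):
    """Return +1 for non-negative input and -1 otherwise (tie convention is an assumption)."""
    return 1 if value >= 0 else -1


def dot(a, b):
    """Scalar product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def output(w, x):
    """Eq. (1): sigma = sign(w . x)."""
    return sign(dot(w, x))


def hebb_step(w, x, sigma, eta):
    """Rule H, assumed form of Eq. (2): w_new = w_old + (eta / M) * sigma * x."""
    m = len(w)
    return [wi + eta / m * sigma * xi for wi, xi in zip(w, x)]


def perceptron_step(w, x, sigma, eta):
    """Rule P: apply rule H only if the example is misclassified, (w . x) * sigma < 0."""
    if dot(w, x) * sigma < 0:
        return hebb_step(w, x, sigma, eta)
    return list(w)


def normalized_perceptron_step(w, x, sigma, eta):
    """Rule PN: one step of rule P followed by rescaling to unit length."""
    w_new = perceptron_step(w, x, sigma, eta)
    norm = math.sqrt(dot(w_new, w_new))
    return [wi / norm for wi in w_new]
```

In this sketch an example that is already classified correctly leaves the weights untouched under P, and under PN it only rescales them to unit length.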

$\eta$ is the size of the learning step. In the following we mainly consider the limit of
infinitely large networks, $M \to \infty$, in which the learning step $\eta$ becomes a learning rate for
a continuous presentation of examples. In this case we can use the analytical methods for 
on-line training which are well developed |2||],|IIJ. In this Letter we study a system of N 
perceptrons with weight vectors $\mathbf{w}^1, \mathbf{w}^2, \ldots, \mathbf{w}^N$ which are trained by a common input vector
$\mathbf{x}$ and their mutual output bits $\sigma^1, \ldots, \sigma^N$.



We consider a set of N interacting perceptrons with a directed cyclic flow of information. 
At each training step all of the networks receive the same randomly chosen input vector 
$\mathbf{x}$. Now perceptron $\mathbf{w}^1$ learns the output from $\mathbf{w}^2$, perceptron $\mathbf{w}^2$ learns from $\mathbf{w}^3$, ...,
perceptron $\mathbf{w}^N$ learns from perceptron $\mathbf{w}^1$. Our analytical and numerical calculations give
the following result: Starting from random initial weight vectors with length $w_0 = |\mathbf{w}_0|$ and
using perceptron learning rule P for each of the networks, the system runs into a state of
complete symmetry with identical overlaps $\mathbf{w}^i \cdot \mathbf{w}^j$ for all pairs $(i,j)$. The stationary state
is given by the equation

$$\eta\, \theta \sqrt{1 + (N-1)\cos\theta} = \sqrt{2\pi}\, w_0\, (1 - \cos\theta), \tag{3}$$

where $\theta$ is the common mutual angle between all weight vectors. Fig. 1 shows the result. For
small learning rate all perceptrons agree with each other, their mutual angle $\theta$ is close to zero.
With increasing learning rate the angle increases to its maximal possible value. The sum of
the weight vectors $\sum_{i=1}^{N} \mathbf{w}^i$ is constant under this learning rule, because for every perceptron
that learns the pattern with a positive sign there is a subsequent neighbour that learns it
with a negative sign. For $\eta \to \infty$ the norm of this sum is negligible compared to $|\mathbf{w}^i|$, and
the vectors form a hypertetrahedron which gives $\cos\theta = -1/(N-1)$. Note that the final
stationary state has a higher symmetry than the ring flow of information. This symmetry
seems to be robust to details of the model: in simulations where each perceptron had a
different quenched learning rate, all the angles between the perceptrons again converged to
the same value.
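
The dependence of the stationary angle on the learning rate can be explored numerically. The following sketch assumes that Eq. (3) reads η θ sqrt(1 + (N-1) cos θ) = sqrt(2π) w_0 (1 - cos θ) and solves it by bisection between θ = 0 and the hypertetrahedron angle arccos(-1/(N-1)). The function name `stationary_angle` and its parameters are hypothetical.

```python
import math


def stationary_angle(eta, n, w0=1.0, iterations=200):
    """Bisection solve of the assumed Eq. (3) for theta in (0, arccos(-1/(n-1)))."""
    theta_max = math.acos(-1.0 / (n - 1))  # hypertetrahedron angle

    def residual(theta):
        # Left-hand side minus right-hand side of the assumed Eq. (3).
        radicand = max(0.0, 1.0 + (n - 1) * math.cos(theta))
        lhs = eta * theta * math.sqrt(radicand)
        rhs = math.sqrt(2.0 * math.pi) * w0 * (1.0 - math.cos(theta))
        return lhs - rhs

    # The residual is positive close to theta = 0 and negative at theta_max.
    low, high = 1e-9, theta_max
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if residual(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)
```

For small η the returned angle is close to zero, and for very large η it approaches arccos(-1/(N-1)), in line with the two limits discussed above.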

The effective repulsion between the weight vectors can be understood geometrically in
the case of two perceptrons: the sum $\mathbf{w}^1 + \mathbf{w}^2$ is conserved; the fixed point results from
an equilibrium between learning the component of $\mathbf{x}$ parallel to the $\mathbf{w}^1$-$\mathbf{w}^2$-plane (which
decreases $\theta$) and the component perpendicular to this plane (which increases $\theta$).

The symmetric behaviour turns out to be different with learning rule PN, where all the
weight vectors $\mathbf{w}^i$ remain on a sphere $|\mathbf{w}^i| = 1$. For small learning rate the system runs into
a symmetric state given by

$$\eta\, \theta = \sqrt{2\pi}\, (1 - \cos\theta) \tag{4}$$

(compare to Eq. (3)). However, this equation can only be geometrically realized up to a
critical value $\eta_c(N)$, where the hypertetrahedron configuration is reached and the sum of
the $\mathbf{w}^i$ vanishes. In the case of two perceptrons, geometrical constraints do not play a role;
however, there is a maximal $\eta_{c,2} = 1.82$, above which no solution of Eq. (4) exists. For larger
learning rates $\eta > \eta_c(N)$ our numerical simulations give the following results as shown in
Fig. 2:

• For N = 2, there is a discontinuous transition to $\cos\theta = -1$ at the mentioned $\eta_{c,2}$.

• For N = 3 the state remains in the triangular configuration $\cos\theta = -1/2$.

• For N > 3 the symmetry is broken spontaneously. The angle $\theta_{ij}$ between perceptrons
i and j now depends on their distance on the ring. However, the symmetry of the ring
is still conserved. This means, for instance, that for N = 7, $\theta_{13}$ is the same as $\theta_{24}$ and
$\theta_{35}$, but there are three different values of mutual angles $\theta_{12}$, $\theta_{13}$ and $\theta_{14}$. In general,
there are now N/2 different angles for even N and (N - 1)/2 angles for odd N. Since
the perceptrons try to increase the angle to their nearest neighbour, the angle to more
distant perceptrons has to increase to satisfy geometric constraints.

• For even values N > 4 we observe an additional discontinuous transition to pairing:
Two subsets are formed with antiparallel alignment between the subsets. This fixed
point is probably unstable in the $M \to \infty$ limit and only observed in simulations
because the self-averaging property of the ODEs breaks down at that point.

Hence, with increasing learning rate the symmetry of the system of interacting perceptrons 
is broken, but the state still has the symmetry of the ring. Note that according to Eq. (2)
the learning step scales to zero with system size M. The prefactor alone triggers the first 
phase transition. 
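
The critical learning rates for rule PN can be estimated directly from the symmetric fixed-point condition. The sketch below assumes that Eq. (4) reads η θ = sqrt(2π)(1 - cos θ), so that each angle θ corresponds to the learning rate η(θ) = sqrt(2π)(1 - cos θ)/θ. For two perceptrons it scans θ in (0, π] for the largest such η. For N ≥ 3 it evaluates η at the hypertetrahedron angle, which is our reading of the definition of η_c(N) and is not a formula given in the text. All function names are hypothetical.

```python
import math


def eta_of_theta(theta):
    """Learning rate for which theta solves the assumed Eq. (4)."""
    return math.sqrt(2.0 * math.pi) * (1.0 - math.cos(theta)) / theta


def max_eta_two_perceptrons(samples=20000):
    """Grid search for the largest eta that still admits a solution of Eq. (4)."""
    best_eta, best_theta = 0.0, 0.0
    for k in range(1, samples + 1):
        theta = math.pi * k / samples
        eta = eta_of_theta(theta)
        if eta > best_eta:
            best_eta, best_theta = eta, theta
    return best_eta, best_theta


def eta_c_hypertetrahedron(n):
    """Eta at which the symmetric solution reaches cos(theta) = -1/(n-1), for n >= 3."""
    return eta_of_theta(math.acos(-1.0 / (n - 1)))
```

The grid search returns a maximum close to the value 1.82 quoted for two perceptrons. A grid was chosen over a root finder only to keep the sketch short.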

Now we show that a system of interacting networks can show better-than-random performance
in a problem of adaptive competition which was recently introduced by Arthur
H and is being studied intensively |§-f|. It is a model of a closed market where N agents 
are competing for limited resources and where the individual profit depends on the action 
of the whole community. 

The model consists of N agents who at each time step have to choose between actions
$\sigma^i = +1$ or $\sigma^i = -1$, $i = 1, \ldots, N$. The profit of each agent depends on the minority decision;
each agent gains $g^i = +1$ if he belongs to the minority, and he pays 1 ($g^i = -1$) if he belongs to the
majority of the common decision. Hence, one has $g^i = -\sigma^i\, \mathrm{sign}(\sum_{j=1}^{N} \sigma^j)$. The global profit
is given by

$$G = \sum_{i=1}^{N} g^i = -\Big|\sum_{j=1}^{N} \sigma^j\Big| ;$$

the cashier always makes profit. Now each agent uses an algorithm which should maximize
his profit. In this model agents know only the history of the minority sign
$S_t = -\mathrm{sign}(\sum_{i=1}^{N} \sigma^i)$ for each previous time step $t$, and the agents are not allowed to exchange
information.

If each agent makes a random decision $\sigma^i$, the mean square global loss is

$$\langle G^2 \rangle = N. \tag{5}$$



It is nontrivial to find an algorithm which performs better than Eq. (5). Previous investigations
studied algorithms where each agent has two or more quenched random tables that prescribe
a decision for each of the $2^M$ possible histories $\mathbf{x}_t = (S_{t-M+1}, \ldots, S_t)$. Each table receives a
score, and the one with the larger score is chosen.
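
The random-guessing baseline of Eq. (5) is easy to check by Monte Carlo sampling. The sketch below assumes the profit rule g^i = -σ^i sign(Σ_j σ^j) and an odd number of agents, so that the majority is always defined. The function names, the number of rounds and the seed are arbitrary choices of ours.

```python
import random


def global_profit(decisions):
    """G = sum_i g^i with g^i = -sigma^i * sign(sum_j sigma^j); assumes an odd number of agents."""
    total = sum(decisions)
    majority = 1 if total > 0 else -1
    return sum(-s * majority for s in decisions)


def mean_square_loss_random(n_agents, rounds, seed=0):
    """Estimate <G^2> when every agent decides by an independent fair coin flip."""
    rng = random.Random(seed)
    accumulated = 0
    for _ in range(rounds):
        decisions = [rng.choice((-1, 1)) for _ in range(n_agents)]
        accumulated += global_profit(decisions) ** 2
    return accumulated / rounds
```

With enough rounds, the estimate divided by the number of agents should scatter around 1, which is the value any adaptive strategy has to beat.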

Here we introduce an approach where each agent uses the same dynamic strategy. We use 
a perceptron with adaptive weights for each agent to make the decision. The weights define 
the strategy of the agent, and our strategies change with time as the weights are updated 
according to the minority decision one time step earlier. We follow the usual scenario for 
training a perceptron: Start from a randomly chosen set of initial weights and train each 
network by the usual Hebbian learning rule. At each time step the decision of each agent is made
by $\sigma^i = \mathrm{sign}(\mathbf{w}^i \cdot \mathbf{x})$ and each perceptron is trained by the minority decision $S_t$,

$$\mathbf{w}^i_{t+1} = \mathbf{w}^i_t - \frac{\eta}{M}\, \mathbf{x}_t\, \mathrm{sign}\left(\sum_{j=1}^{N} \mathrm{sign}(\mathbf{x}_t \cdot \mathbf{w}^j_t)\right). \tag{6}$$

Hence, the bit sequence $(S_t)$ is generated by the negative output of a committee machine.
From Eq. (6) it follows that each weight vector is changed by the same increment, hence only
the center of mass of the weight vectors changes during the learning process.
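
A small simulation makes the statement about the common increment concrete. The sketch assumes that Eq. (6) reads w^i_{t+1} = w^i_t - (η/M) x_t sign(Σ_j sign(x_t · w^j_t)), uses an odd number of agents, and feeds the minority sign back into the history window. The function name, the default parameters and the Gaussian initial weights are illustrative assumptions.

```python
import random


def run_minority_perceptrons(n_agents=5, m=8, eta=0.5, steps=200, seed=1):
    """Hebbian minority-game dynamics of the assumed Eq. (6); n_agents should be odd."""
    rng = random.Random(seed)
    initial = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n_agents)]
    weights = [row[:] for row in initial]
    x = [rng.choice((-1, 1)) for _ in range(m)]  # random initial history window
    profits = []
    for _ in range(steps):
        # Each agent decides with its own perceptron, sigma^i = sign(w^i . x).
        decisions = [
            1 if sum(wk * xk for wk, xk in zip(w, x)) >= 0 else -1
            for w in weights
        ]
        total = sum(decisions)
        minority = -1 if total > 0 else 1  # S_t = -sign(sum_i sigma^i)
        for w in weights:
            for k in range(m):
                w[k] += eta / m * minority * x[k]  # same increment for every agent
        profits.append(-abs(total))  # global profit G of this round
        x = x[1:] + [minority]  # slide the history window by one bit
    return initial, weights, profits
```

Because every agent receives the same increment, the differences between any two weight vectors are unchanged by the dynamics; only their common shift evolves.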

Our numerical calculations show that starting from a random set of weight vectors and
initial input the system relaxes to a state with a good performance. The global gain is of
the order of N and for small learning rates the system performs better than
random guessing. We succeeded in solving the dynamics of the interacting networks analytically for
the case where the input vector is replaced by a random one.

Approximating the input $\mathbf{x}$ by a random one, we derived the equation of motion of the
norm of the center of mass; the fixed point describes the global gain in the long run. To
simplify the calculation, the initial norms $|\mathbf{w}^i_0|$ are set to 1, the sum $\sum_i \mathbf{w}^i_0$ is 0, and the
scalar products are symmetric: $\mathbf{w}^i_0 \cdot \mathbf{w}^j_0 = -1/(N-1)$ for $i \neq j$. We obtain for the attractor
of the dynamics

$$\langle G^2 \rangle / N = 1 + (N-1)\left(1 - \frac{2}{\pi} \arccos\frac{A - 1/(N-1)}{A + 1}\right), \tag{7}$$

$$A = \frac{\pi \eta^2}{16}\left(1 + \sqrt{1 + \frac{16(\pi - 2)}{\pi \eta^2 N}}\right). \tag{8}$$



For random patterns, $A$ is the square norm of the center of mass at the fixed point. Eq. (7)
agrees with simulations of both the real time series and random patterns, as shown in Fig.
3. Very similar results (up to factors of $1 + 1/\sqrt{N}$) are found analytically and in simulations
by starting with uncorrelated random vectors. For small learning rate $\eta \to 0$ we obtain the
best global gain

$$\langle G^2 \rangle = \left(1 - \frac{2}{\pi}\right) N \simeq 0.363\, N. \tag{9}$$
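
The fixed-point expressions can be evaluated numerically to see how the loss depends on the learning rate. The sketch assumes Eq. (8) in the form A = (π η²/16)(1 + sqrt(1 + 16(π - 2)/(π η² N))) and Eq. (7) in the form ⟨G²⟩/N = 1 + (N - 1)(1 - (2/π) arccos((A - 1/(N-1))/(A + 1))). Both forms are reconstructions that should be checked against the original equations, and the function names are hypothetical.

```python
import math


def center_of_mass_sq_norm(eta, n):
    """Assumed Eq. (8): square norm A of the center of mass at the fixed point."""
    root = math.sqrt(1.0 + 16.0 * (math.pi - 2.0) / (math.pi * eta ** 2 * n))
    return math.pi * eta ** 2 / 16.0 * (1.0 + root)


def mean_square_loss_per_agent(eta, n):
    """Assumed Eq. (7): <G^2>/N at the fixed point for n agents."""
    a = center_of_mass_sq_norm(eta, n)
    cos_theta = (a - 1.0 / (n - 1)) / (a + 1.0)
    return 1.0 + (n - 1) * (1.0 - 2.0 / math.pi * math.acos(cos_theta))
```

For small η and large N the value approaches 1 - 2/π, and it grows with η until it exceeds the random-guessing value of 1.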



It is interesting that this result is also obtained for a scenario where we use a distribution of
learning rates $\eta$ instead of a fixed one. For every perceptron at every time step a different
learning rate is chosen. Hence the center of mass is not fixed during learning and the weight
vectors increase their lengths similar to a random walk. This process decreases the average
learning rate compared to the length and leads to the performance given in Eq. (9).

Hence, the system of interacting networks performs better than the random decision. 
In fact, there are several advantages of the system of neural networks compared to the 
algorithm of scoring quenched random tables. 

Firstly, the size M of the history does not have to be adjusted to the number of agents 
in order to perform better than random. Our analysis implicitly assumes that N < M and 
both M and N are large, but simulations show good qualitative agreement even for N = 21,
M = 4. For small M, $\langle G^2 \rangle / N$ even tends to be smaller than predicted for $M = \infty$. We
suspect that a strong dependence on the ratio of players to possible strategies only occurs 
when players have to pick from a set of fixed strategies, and is absent when they fine-tune 
one strategy. However, this point still needs further investigation. 

Secondly, on average all of the agents perform identically - this is also a consequence of 
the absence of quenched disorder. There is no phase transition between a set of successful 
agents and losers, as found in Ref. || for the random tables. This is clear from the geometrical
interpretation: The center of mass does a random walk on a hypersphere around the
origin. The radius depends on the learning rate; if the radius is smaller than $\sqrt{N}$ (obtained
when adding up N random vectors of norm 1), the "strategies" are distributed better
than random. As the center of mass moves, each perceptron shifts from the current majority 
side to the minority side and back. 

Eq. (9) represents the optimum obtainable for perceptrons as long as the symmetry
among them is not broken. It would be interesting to study other network architectures to 
see whether the profit of a system of competing neural networks can still be improved. 

This work benefitted from a seminar at the Max-Planck Institut für Physik komplexer
Systeme, Dresden. The authors want to thank Michael Biehl and Georg Reents for useful 
discussions, and Andreas Engel and Johannes Berg for their introduction to the minority 
game. 
FIG. 1. Fixed point of cyclic learning with rule P: simulations with M = 100 for 2 to 5
perceptrons and solutions of Eq. (3). $\theta$ is the common mutual angle between all weight vectors.

FIG. 2. Fixed points of $\cos\theta$ in cyclic learning using rule PN for 2, 3, 4, and 5 perceptrons,
respectively, in simulations with M = 100. For $\eta < \eta_c$ all weight vectors have a common mutual
angle $\theta$. For $\eta > \eta_c(N)$ and a ring with more than 3 perceptrons, the symmetry is broken, and the
angle $\theta_{ij}$ depends on the distance between perceptrons i and j on the ring.

FIG. 3. Average loss $\langle G^2 \rangle / N$ versus learning rate in the Bar Problem, using learning rule H.
Simulations used M = 100.



